Skip to content

Phase 1c: Bias-corrected local-linear CI (CCT 2014)#340

Merged
igerber merged 8 commits intomainfrom
did-no-untreated
Apr 20, 2026
Merged

Phase 1c: Bias-corrected local-linear CI (CCT 2014)#340
igerber merged 8 commits intomainfrom
did-no-untreated

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 19, 2026

Summary

  • Port nprobust's lprobust() single-eval-point path (lprobust.R:177-246) into diff_diff/_nprobust_port.py as lprobust() + LprobustResult. Uses the Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix Q.q to compute classical and bias-corrected point estimates along with naive and robust standard errors in a single pass.
  • Add public wrapper bias_corrected_local_linear() + BiasCorrectedFit in diff_diff/local_linear.py. Returns the μ-scale CI [tau.bc ± z_{1-α/2} * se.rb] for paper Equation 8. Auto-bandwidth path delegates to mse_optimal_bandwidth and matches nprobust's rho=1 default (b = h).
  • Refactor: extract shared _validate_had_inputs helper so Phase 1b and Phase 1c enforce identical HAD-scope input rules (empty/non-finite/negative-dose/off-support-boundary/mass-point/Design 1' plausibility).
  • Golden generator benchmarks/R/generate_nprobust_lprobust_golden.R + JSON on 5 DGPs (uniform, Beta(2,2), half-normal, clustered, shifted-boundary); R's z = qnorm(1-α/2) exported so Python consumes the exact critical value, eliminating ppf/qnorm ULP drift on CI bounds.

Methodology references (required if estimator / math changes)

  • Method name(s): Calonico-Cattaneo-Titiunik (2014) robust bias correction for local-polynomial regression; Calonico-Cattaneo-Farrell (2018) MSE-optimal bandwidth (consumed from Phase 1b); de Chaisemartin, Ciccia, D'Haultfoeuille, Knau (2026) Equation 8 (mu-scale CI).
  • Paper / source link(s): paper review at docs/methodology/papers/dechaisemartin-2026-review.md; nprobust 0.5.0 (SHA 36e4e53), R/nprobust/R/lprobust.R:177-246.
  • Any intentional deviations from the source (and why): (a) vce in {"hc2","hc2_bm"} combined with cluster= raises ValueError; nprobust silently accepts the combination but it is not a well-defined estimator (hc2/hc3 assume independent observations). (b) Auto mode uses b = h (matching nprobust's rho=1 default) rather than Phase 1b's distinct b_mse; the paper itself uses a single h*_G throughout Equation 8, and b_mse is still surfaced via bandwidth_diagnostics for inspection. (c) Clustered golden DGP 4 uses manual h=b=0.3 to sidestep an nprobust-internal singleton-cluster shape bug in lpbwselect.mse.dpi's pilot fits; the Python port has no equivalent bug. All three deviations are documented in docs/methodology/REGISTRY.md under the Phase 1c entry.

Validation

  • Tests added/updated: tests/test_nprobust_port.py::TestLprobustSingleEval (8 port-level tests exercising the CCT-2014 math with R-supplied bandwidths); tests/test_bias_corrected_lprobust.py (29 wrapper-level tests: parity × 5 DGPs, CI behavior, full input contract, parameter interactions, end-to-end with Phase 1b selector, validator idempotence). Parity at tiered tolerances per docs/methodology/REGISTRY.md Phase 1c entry: atol=1e-12 on tau/se outputs, atol=1e-13 on CI bounds (R's z exported in golden JSON); clustered DGP 4 hits bit-parity at atol=1e-14.
  • Backtest / simulation / notebook evidence (if applicable): N/A - estimator-math PR without user-facing tutorials. Phase 2 (estimator class) will add notebook evidence.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

Port nprobust::lprobust's single-eval-point path (lprobust.R:177-246) as
the foundation for paper Equation 8:

- diff_diff/_nprobust_port.py: add `lprobust()` + `LprobustResult`. Uses
  the Calonico-Cattaneo-Titiunik (2014) bias-combined design matrix `Q.q`
  to produce classical and bias-corrected point estimates along with naive
  and robust (CCT 2014) standard errors in a single pass.
- diff_diff/local_linear.py: add `bias_corrected_local_linear()` +
  `BiasCorrectedFit`. Public wrapper returns the mu-scale CI
  `[tau.bc +/- z_{1-alpha/2} * se.rb]`. Auto-bandwidth path delegates to
  `mse_optimal_bandwidth` and honors nprobust's rho=1 default (b = h).
  Also extracts shared `_validate_had_inputs` helper from Phase 1b.
- benchmarks/R/generate_nprobust_lprobust_golden.R + golden JSON: 5 DGPs
  (Uniform, Beta(2,2), half-normal, clustered, shifted-boundary); R's
  z = qnorm(1-alpha/2) exported so Python skips ppf and matches bit-wise
  on CI arithmetic.
- Tests: TestLprobustSingleEval (8 port-level) + test_bias_corrected_lprobust
  (29 wrapper-level). Tiered tolerances per plan: 1e-12 on tau/se,
  1e-13 on CI bounds; clustered DGP 4 hits bit-parity (1e-14).

Known deviations from nprobust (in REGISTRY.md): hc2/hc3 + cluster
raises (nprobust silently accepts); clustered DGP 4 uses manual h=b=0.3
to sidestep an nprobust-internal singleton-cluster bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

⛔ Blocker — one P0 inference bug and one P1 methodology/parameter-propagation bug remain in the new Phase 1c path.

Executive Summary

  • The new lprobust port and wrapper mostly line up with the Phase 1c registry entry, and the documented deviations (b = h in auto mode, rejecting hc2/hc3 + cluster, clustered DGP 4 workaround) are correctly documented and therefore non-blocking.
  • P0: bias_corrected_local_linear() builds CI bounds directly from tau_bc ± z * se_rb with no NaN-safe gate, so degenerate exact-fit cases can return finite or infinite CIs when se_robust is invalid.
  • P1: the auto-bandwidth path silently ignores user cluster, vce, and nnmatch settings. It always selects h/b with the Phase 1b unclustered vce="nn", nnmatch=3 selector, then applies the requested settings only in the final lprobust() call.
  • The new tests cover parity and many input-contract branches, but they do not exercise the broken auto-parameter interactions or zero-SE CI handling.
  • The TODO items added in this PR are appropriately tracked tech debt and are not blockers.
  • I reviewed this statically from the diff and surrounding code; I did not run the suite in this sandbox because the Python environment here is missing numpy.

Methodology

  • Severity: P0. Impact: The new Equation 8 CI path returns misleading inference when se_robust is zero or non-finite. lprobust() can legitimately produce se_rb == 0 in exact-fit cases because the residual vector can collapse to zero before the sandwich step, and the wrapper then returns a finite zero-width CI instead of (NaN, NaN). That violates the repo’s established inference contract for invalid SEs and is silent bad statistical output. See diff_diff/_nprobust_port.py:L1257-L1271, diff_diff/local_linear.py:L1128-L1135, and diff_diff/utils.py:L175-L209. Concrete fix: derive the CI through safe_inference(result.tau_bc, result.se_rb, alpha=alpha)[2] or an equivalent all-or-nothing NaN gate, and add regression coverage for constant-y / exact-fit and other invalid-SE cases.
  • Severity: P1. Impact: The affected methods are the CCT (2014) robust bias-corrected CI plus the CCF (2018) auto-bandwidth stage. In auto mode, bias_corrected_local_linear() calls mse_optimal_bandwidth() without forwarding cluster, vce, or nnmatch, while that selector is hard-coded to cluster=None, vce="nn", and nnmatch=3. The later lprobust() call does use the user-requested settings, so clustered fits or non-nn / non-default-nnmatch fits are using bandwidths chosen for a different estimator than the one reported. That is an undocumented methodology deviation and an incomplete parameter-propagation bug. See diff_diff/local_linear.py:L1078-L1086, diff_diff/local_linear.py:L1112-L1125, and diff_diff/local_linear.py:L797-L810. Concrete fix: in auto mode, call diff_diff._nprobust_port.lpbwselect_mse_dpi directly with cluster=cluster_arr, vce=vce, and nnmatch=nnmatch, or explicitly reject unsupported auto combinations until that threading is implemented and validated.
  • Severity: P3. Impact: The b = h auto-mode choice, the hc2/hc3 + cluster rejection, and the manual clustered DGP 4 workaround are explicitly documented in the methodology registry, so they are not defects for this PR. See docs/methodology/REGISTRY.md:L2315-L2317. Concrete fix: none.

Code Quality

No additional findings beyond the inference bug above.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity: P3. Impact: The new TODO rows correctly track deferred parity expansion for additional kernels / HC VCE modes plus later weights= / multi-eval support and clustered-auto parity work, so those items are non-blocking deferred work rather than review blockers. See TODO.md:L89-L93. Concrete fix: none in this PR.

Security

No findings.

Documentation/Tests

  • Severity: P2. Impact: The new suite does not cover the two failing surfaces above. It tests clustered behavior only with manual h/b, and the auto end-to-end test uses only the default unclustered vce="nn", nnmatch=3 path; there is also no regression test for zero/non-finite se_robust CI propagation. That left both the auto-parameter mismatch and the invalid-SE CI bug unguarded. See tests/test_bias_corrected_lprobust.py:L333-L387. Concrete fix: add tests for cluster+auto, vce="hc1"+auto, nnmatch != 3 + auto, and exact-fit/constant-y cases that should yield ci_low = ci_high = NaN.

Path to Approval

  1. Make the Phase 1c CI construction NaN-safe so any se_robust <= 0 or non-finite se_robust yields (NaN, NaN) rather than a finite or infinite CI, and add a regression test that exercises that path.
  2. Fix auto-bandwidth propagation so cluster, vce, and nnmatch are honored during bandwidth selection, or reject unsupported auto combinations explicitly. Add regression tests for cluster+auto, non-nn vce+auto, and non-default nnmatch+auto.

P0: bias_corrected_local_linear now routes the CI through
`safe_inference()` so degenerate cases with `se_robust <= 0` or
non-finite `se_robust` (e.g., exact-fit / constant-y) return
`(NaN, NaN)` rather than a misleading zero-width or infinite CI.
Matches the repo-wide inference contract (CLAUDE.md Key Design
Pattern #6).

P1: Auto-bandwidth path now calls `lpbwselect_mse_dpi` directly
with `cluster`, `vce`, and `nnmatch` forwarded. Previously it went
through `mse_optimal_bandwidth` which hard-codes unclustered / nn /
nnmatch=3, silently mismatching the downstream `lprobust` fit's
reported estimator.

Tests added: TestNaNSafeCI (constant-y + near-zero-SE) and
TestAutoBandwidthForwardsParameters (cluster+auto, vce='hc1'+auto,
nnmatch=5+auto), all asserting the selected bandwidth changes when
the corresponding parameter changes (catches silent fallback).

Also: suppress spurious BLAS FPE warnings in lprobust_bw's hc/hc2/hc3
branch (numpy issue #21432 pattern), newly reachable via the
wired-through vce='hc1' auto-bandwidth path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 237a80098e62b21c956ec5db755f5bcfc04ccbc0


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains in the new clustered Phase 1c manual-bandwidth path.

Executive Summary

  • P1: The new clustered bias_corrected_local_linear(..., h=..., cluster=...) / lprobust(..., cluster=...) path only rejects missing cluster IDs for floating dtypes. Object-dtype None / np.nan can bypass validation and then contaminate clustered variance aggregation.
  • The two blocker findings from the previous AI review are fixed: auto mode now forwards cluster / vce / nnmatch into the selector, and CI construction now goes through safe_inference. See diff_diff/local_linear.py:L1077-L1103, diff_diff/local_linear.py:L1171-L1185, and diff_diff/utils.py:L175-L209.
  • The Phase 1c deviations called out in the PR (b = h in auto mode, rejecting hc2/hc3 + cluster, and the clustered DGP 4 workaround) are documented in the methodology registry, so they are non-blocking P3 informational items. See docs/methodology/REGISTRY.md:L2316-L2317.
  • The new TODO rows correctly track deferrable follow-up work around additional parity coverage and out-of-scope extensions. See TODO.md:L89-L93.
  • Review was static only; I could not run the suite in this sandbox because numpy is unavailable here.

Methodology

  • Severity: P1
    Impact: The affected method is the clustered CCT (2014) robust bias-corrected local-linear variance path. The new wrapper and new lprobust port only reject missing cluster IDs when cluster.dtype is floating, but the clustered sandwich later groups on np.unique(cluster). That means object-dtype clusters carrying None or object-np.nan can either be treated as real clusters or fail later with a non-targeted error, instead of honoring the documented “reject missing cluster IDs” contract. On the silent branch, that can misstate clustered se_cl / se_rb with no warning. See diff_diff/local_linear.py:L1053-L1065, diff_diff/_nprobust_port.py:L1127-L1139, and the clustered aggregation at diff_diff/_nprobust_port.py:L341-L350. The selector path already has the correct cross-dtype missing-ID guard at diff_diff/_nprobust_port.py:L675-L713.
    Concrete fix: Reuse the selector’s has_missing logic in both bias_corrected_local_linear and lprobust, so None and object-dtype np.nan are rejected before any clustered variance computation.

  • Severity: P3
    Impact: The documented Phase 1c deviations/workarounds are properly recorded in the registry and therefore are not defects for this PR. See docs/methodology/REGISTRY.md:L2316-L2317.
    Concrete fix: None.

Code Quality

  • No additional findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3
    Impact: The remaining Phase 1c follow-ups are explicitly tracked in TODO.md rather than left implicit: additional kernel parity, HC-mode parity, weights=, multi-eval support, and clustered-auto parity once the upstream nprobust issue is resolved. See TODO.md:L89-L93.
    Concrete fix: None in this PR.

Security

  • No findings.

Documentation/Tests

  • Severity: P2
    Impact: The new regression suite only checks float NaN cluster rejection on the manual-bandwidth wrapper path, so it misses the object-dtype missing-ID case that allows the P1 above through. See tests/test_bias_corrected_lprobust.py:L313-L319.
    Concrete fix: Add wrapper-level and direct-port tests using cluster=np.array([...], dtype=object) with None and object-np.nan, asserting a targeted ValueError.

Path to Approval

  1. Extend missing-cluster validation in bias_corrected_local_linear and diff_diff._nprobust_port.lprobust to reject None / object-dtype np.nan, not just floating NaN/Inf.
  2. Add regression coverage for the manual-bandwidth wrapper path and direct lprobust path on object-dtype cluster arrays with missing sentinels, and assert the targeted validation error before clustered variance computation.

CI review follow-up: the floating-dtype-only missing-ID guard in
`bias_corrected_local_linear` and `lprobust` let object-dtype
arrays with `None` / object `np.nan` sentinels bypass validation.
The downstream `lprobust_vce` cluster loop would then group on
`np.unique`, treating the sentinel as a real cluster and silently
misstating clustered SE.

Extract the dtype-agnostic `_cluster_has_missing` helper already
used inside `lpbwselect_mse_dpi` and apply it at all three entry
points: the selector, the port-level `lprobust`, and the public
`bias_corrected_local_linear` wrapper.

Regression tests: object-dtype cluster arrays with None and with
np.nan sentinels raise a targeted ValueError at both the wrapper
(test_bias_corrected_lprobust) and the port (test_nprobust_port)
entry points.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: bee6805c663aef540fce95cda7152b9708181bd0


Overall Assessment

Looks good — this re-review resolves the prior P1 on object-dtype missing cluster IDs, keeps the earlier CI/inference fix in place, and I did not find any new unmitigated P0/P1 issues in the changed files. Review was static only because this sandbox does not have pytest or numpy.

Executive Summary

Methodology

Affected methods: Calonico-Cattaneo-Titiunik (2014) robust bias correction via lprobust, Calonico-Cattaneo-Farrell (2018) MSE-DPI bandwidth selection in auto mode, and the de Chaisemartin et al. (2026) Equation 8 μ-scale CI wrapper. I did not find any unmitigated methodology defects in the changed implementation; the estimator logic is source-mapped in diff_diff/_nprobust_port.py:L1038-L1067, and the wrapper behavior matches the registry notes in docs/methodology/REGISTRY.md:L2317-L2318.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • P3 Impact: Remaining Phase 1c coverage gaps are explicitly tracked in TODO.md for non-Epanechnikov kernel parity, HC-mode parity, weights, multi-eval support, and clustered auto-bandwidth parity. Because these are tracked deferred items rather than silent correctness bugs, they are non-blocking. Concrete fix: None in this PR; follow the tracked TODO items. See TODO.md:L89-L93.

Security

No findings.

Documentation/Tests

Docstring note from AI review. The wrapper's docstring still said
auto mode chooses both h and b via `mse_optimal_bandwidth`, but the
implementation now calls `_nprobust_port.lpbwselect_mse_dpi` directly
(to forward cluster / vce / nnmatch) and then sets `b = h` per
nprobust's rho=1 default. Update the docstrings to reflect that.

Behavior is unchanged; this is a comment-only clarification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 20, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 7672f8d3b2dbb7b2a32afa6e793b411367e40085


Overall Assessment

Looks good — this re-review resolves the prior P1 surfaces I checked, and I did not find any unmitigated P0/P1 issues in the changed files. The only remaining item is a P3 documentation/test-contract mismatch. This was a static review only because the sandbox does not have pytest or scipy.

Executive Summary

  • The affected methods are the CCT (2014) robust bias-corrected local-polynomial estimator, the CCF (2018) MSE-DPI selector used in auto mode, and the de Chaisemartin et al. (2026) Equation 8 μ-scale CI wrapper.
  • The earlier missing-cluster-ID issue appears fixed across the wrapper, selector, and direct port via the shared dtype-agnostic _cluster_has_missing() guard.
  • The earlier NaN/CI issue also appears fixed: bias_corrected_local_linear() now routes CI construction through safe_inference(), so invalid or zero se_robust values should propagate to NaN CI endpoints instead of misleading finite output.
  • Auto-bandwidth mode now forwards cluster, vce, and nnmatch directly into lpbwselect_mse_dpi(), avoiding a selector/fit methodology mismatch.
  • The three non-R Phase 1c behaviors called out in the PR are documented in the Methodology Registry, so they are informational rather than defects under the review rules.
  • One P3 remains: several new docs/comments say Python “consumes R’s z directly,” but the changed code still computes production CI through SciPy-based safe_inference(), and the parity tests compare stored CI bounds rather than reconstructing from the exported z.

Methodology

I did not find any unmitigated P0/P1 methodology defects in the changed estimator path. The prior cluster-missing, NaN-safe CI, and auto-forwarding findings appear addressed in diff_diff/_nprobust_port.py:L87-L111, diff_diff/_nprobust_port.py:L1138-L1154, diff_diff/local_linear.py:L1063-L1083, diff_diff/local_linear.py:L1189-L1203, diff_diff/local_linear.py:L1103-L1121, tests/test_bias_corrected_lprobust.py:L313-L342, tests/test_bias_corrected_lprobust.py:L430-L518, and tests/test_nprobust_port.py:L591-L620.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • P3 Impact: The remaining Phase 1c coverage gaps I would normally worry about here are explicitly tracked in TODO.md: non-Epanechnikov kernel parity, HC-mode parity, weights=, multi-eval support, and clustered auto-bandwidth parity. Under the review rules these tracked items are non-blocking. Concrete fix: None in this PR; follow the tracked TODO items. References: TODO.md:L89-L93

Security

No findings.

Documentation/Tests

Review note: I could not execute the new tests in this sandbox because pytest is not installed, and I also could not import SciPy here.

…tion)

P3 doc-accuracy note from AI review: the registry, R generator, and
local_linear.py comments said the Python side "consumes R's z
directly" but the wrapper actually computes its own z via
scipy.stats.norm.ppf inside safe_inference(). The golden JSON's z
field is an audit/reference export so a reviewer can confirm R's
qnorm and Python's scipy.stats.norm.ppf agree to machine precision.
The parity harness compares Python-computed CI bounds to R-computed
CI bounds (not R's z fed into Python's arithmetic).

Behavior is unchanged; only doc phrasing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 20, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 12a8efea071535d7dd3798ca0bf8981af9980c4f


Overall Assessment

Looks good — I did not find any unmitigated P0/P1 issues in the changed files. The affected methods are the CCT (2014) robust bias-corrected local-polynomial estimator, the CCF (2018) MSE-DPI selector used in auto mode, and the de Chaisemartin et al. (2026) Equation 8 μ-scale CI wrapper as recorded in docs/methodology/REGISTRY.md:L2316-L2318. Prior re-review concerns around missing cluster IDs, NaN-safe CI construction, and auto-bandwidth parameter forwarding appear resolved in diff_diff/_nprobust_port.py:L87-L111, diff_diff/local_linear.py:L1063-L1120, diff_diff/local_linear.py:L1189-L1203, and diff_diff/utils.py:L175-L205.

Executive Summary

  • The changed estimator path is source-mapped to nprobust::lprobust single-eval logic and aligns with the Phase 1c registry entry in docs/methodology/REGISTRY.md:L2317-L2318.
  • The three non-R behaviors in scope are documented and correctly implemented, so they are informational rather than defects: b = h in auto mode, rejecting cluster with vce in {"hc2","hc3"}, and manual h=b=0.3 for the clustered golden DGP.
  • The earlier cluster-missing and NaN/CI issues appear fixed via the shared _cluster_has_missing() guard and safe_inference() CI path in diff_diff/_nprobust_port.py:L87-L111, diff_diff/local_linear.py:L1063-L1083, and diff_diff/local_linear.py:L1189-L1203.
  • Auto mode now forwards cluster, vce, and nnmatch into the selector, avoiding the prior selector/fit mismatch in diff_diff/local_linear.py:L1095-L1120.
  • Remaining Phase 1c parity gaps are properly tracked in TODO.md:L89-L93, so they are non-blocking under the review rules.
  • One P3 remains: the changed parity/audit artifacts are still internally inconsistent about CI critical-value provenance and the exact parity contract.

Review note: this was a static review. I could not execute the new parity tests in this sandbox because the available Python environment is missing numpy.

Methodology

No unmitigated P0/P1 methodology defects found in the changed estimator path.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity: P3
    Impact: Remaining Phase 1c parity/feature gaps are explicitly tracked in TODO.md: alternate kernels, HC vce modes, weights=, multi-eval support, and clustered auto-bandwidth parity. Under the review rules these are non-blocking. See TODO.md:L89-L93.
    Concrete fix: None in this PR; follow the tracked TODO items.

Security

No findings.

Documentation/Tests

P3 follow-up from AI review. Three small inconsistencies to resolve:

1. `bias_corrected_local_linear` docstring still described tau_cl/se_cl
   as bit-parity and said Python consumes R's z directly. The actual
   contract is atol=1e-12 on all four scalars (DGP 1-3) and the
   wrapper computes its own z via scipy.stats.norm.ppf; R's qnorm is
   stored in the JSON for audit only. Docstring updated to match.

2. Committed golden JSON metadata still had the old "consume R's
   critical value directly" string because the generator was edited
   without regenerating. Regenerated so JSON metadata matches the
   corrected audit-export wording in the R script.

3. Parity tests for DGP 4 and DGP 5 did not assert CI bounds. Added
   ci_low / ci_high assertions at the same tolerance as the
   corresponding se_rb assertion (bit-parity for DGP 4, 1e-12 for
   DGP 5), so the audit surface matches what the registry states.

Behavior unchanged; tests strengthened and docs aligned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 20, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 7eeb42ac95be199eb937c4e827f9bb382135b6ac


Overall Assessment
⚠️ Needs changes. I did not find a P0, but there is one unmitigated P1 in the new Phase 1c variance path.

Executive Summary

  • The prior P3 around CI critical-value provenance looks largely resolved: the registry, wrapper notes, generator, and committed JSON now agree that production CI uses SciPy’s norm.ppf and stores R’s qnorm only for audit.
  • One P1 methodology issue remains in the CCT (2014) port: lprobust() reuses the p-fit HC2/HC3 leverage for the q-fit robust-bias-correction residual path, which can misstate se_rb and the de Chaisemartin et al. (2026) Equation 8 CI for unclustered vce="hc2" / "hc3".
  • The Phase 1c deviations that are actually documented in the Methodology Registry are non-blocking: auto mode uses b = h, clustered hc2/hc3 is rejected, and clustered DGP 4 uses manual h=b=0.3.
  • The new TODO.md entries correctly track deferrable Phase 1c follow-ups and are non-blocking under project policy.
  • Static review only: this sandbox is missing numpy, so I could not execute the new parity tests.

Methodology

  • Severity: P1. Impact: Affected methods are the Calonico-Cattaneo-Titiunik (2014) robust bias-corrected local-polynomial estimator in lprobust() and the de Chaisemartin et al. (2026) Equation 8 wrapper that consumes se_rb. In diff_diff/_nprobust_port.py:L1265-L1288, the code computes only one HC2/HC3 leverage vector from the p-fit (R_p, invG_p, W_h) and passes it to both res_h and res_b, even though res_b is built from q-fit predictions (predicts_q). Because diff_diff/_nprobust_port.py:L322-L330 applies HC2/HC3 scaling directly from hii, the robust-bias-correction path is combining q-fit residuals with p-fit leverage. That silently misstates V_Y_bc, se_rb, and therefore the public CI for exposed unclustered vce="hc2" / "hc3" modes in diff_diff/local_linear.py:L953-L965 and diff_diff/local_linear.py:L993-L995. The same module’s bandwidth code already uses stage-matched leverage, which is the pattern missing here: diff_diff/_nprobust_port.py:L480-L500 and diff_diff/_nprobust_port.py:L543-L560. Concrete fix: compute separate hii_p and hii_q, pass hii_p to res_h and hii_q to res_b, and add regression tests for unclustered hc2/hc3 that assert se_rb and CI against R goldens or an equivalent fixed-bandwidth reference.
  • Severity: P3. Impact: The Phase 1c behaviors that actually deviate from the R reference are documented in the registry, so they are informational rather than defects: auto mode sets b = h, clustered hc2/hc3 is rejected, and clustered DGP 4 uses manual h=b=0.3. See docs/methodology/REGISTRY.md:L2317-L2318. Concrete fix: None.

Code Quality
No findings.

Performance
No findings.

Maintainability
No findings.

Tech Debt

  • Severity: P3. Impact: The new Phase 1c TODO rows correctly track deferred parity/features for additional kernels, HC VCE goldens, weights=, multi-eval support, and clustered-auto parity, so these are non-blocking under the review rules. They do not mitigate the HC2/HC3 leverage bug above, because incorrect SE math is not deferrable. See TODO.md:L89-L93. Concrete fix: None in this PR.

Security
No findings.

Documentation/Tests

Path to Approval

  1. In diff_diff._nprobust_port.lprobust(), compute a separate q-fit HC2/HC3 leverage vector and use it for res_b; keep the p-fit leverage only on res_h.
  2. Add positive-path tests for unclustered vce="hc2" and vce="hc3" in both tests/test_nprobust_port.py and tests/test_bias_corrected_lprobust.py that assert se_rb, ci_low, and ci_high against deterministic goldens or a direct reference calculation.

Address CI review P1 on HC2/HC3 leverage. nprobust's lprobust reuses
the p-fit hat-matrix leverage for the q-fit residual path
(lprobust.R:229-241). The Python port matches R exactly, but that R
behavior deserves a derivation and a dedicated golden anchor before
the public wrapper can advertise CCT-2014 conformance on hc-mode
paths. The Phase 1c plan always said hc0-3 were "exposed but not
golden-tested"; this commit restricts the public surface to what
actually has a parity anchor.

Changes:
- `bias_corrected_local_linear`: `vce != "nn"` now raises
  `NotImplementedError` with a pointer to the port-level
  `diff_diff._nprobust_port.lprobust` for callers who still need the
  broader surface. Docstring updated.
- Tests: swap positive-path hc-mode tests for negative-path raises;
  drop `test_auto_vce_hc1_returns_finite` (now superseded by
  `test_auto_vce_hc1_rejected_in_phase_1c`); drop the cluster+hc2/hc3
  tests (now covered by the plain hc2/hc3 rejection tests).
- REGISTRY.md: swap "Deviation from R" wording for a clearer
  "public-API surface restriction" note explaining the R-side hii
  reuse and the deferred derivation.
- TODO.md: upgrade the hc-mode expansion TODO from Low to Medium and
  document the q-fit leverage derivation as the gating work.

Also fix the P3 tolerance-doc inconsistency: registry/docstring now
say atol=1e-12 on all six scalars across unclustered DGPs (tests
actually assert 1e-12, with DGP 1 / DGP 3 landing closer to 1e-13
in practice).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 20, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 2c48e60e7efd8524acd69b5aeab939e688fadd89


Overall Assessment

✅ Looks good

Executive Summary

  • The prior P1 on exposed HC-mode robust-bias-correction SEs is effectively resolved for the public Phase 1c API: bias_corrected_local_linear() now rejects non-"nn" vce, the registry documents that restriction, and HC exposure is explicitly deferred in TODO.md.
  • On the supported path, the PR matches the paper/registry design: it implements the μ-scale Equation 8 CI, uses safe_inference() for NaN-consistent CI gating, and forwards auto-bandwidth inputs into the same selector the final fit uses.
  • The shared HAD validator now covers the important boundary, mass-point, negative-dose, and missing-cluster edge cases on both wrapper and port entry points.
  • Remaining Phase 1c limitations are documented or tracked rather than silently shipped.
  • Static review only: this environment does not have numpy, scipy, or pytest, so I could not execute the new parity tests.

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity P3 (informational). Impact: The remaining Phase 1c follow-ups are properly tracked: extra-kernel parity, public HC-mode exposure, weights=, multi-eval support, and clustered auto-bandwidth parity. Under the stated review policy these tracked items are non-blocking. Concrete fix: None in this PR. TODO.md:L89-L93

Security

  • No findings.

Documentation/Tests

P3 follow-up from AI review. The class-level scope note still claimed
HC0-3 were "exposed here", and the Raises section listed the old
`ValueError` for `vce='hc2'/'hc3'` with cluster. Both are stale: the
wrapper now raises `NotImplementedError` for any `vce != "nn"`
regardless of cluster. Updated the scope note and Raises section to
reflect the current behavior.

Behavior unchanged; docstring-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 20, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 5cd60516a9d9b6288ebe26ae2f80c2a3bc2b9d3d


Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. This re-review’s prior documentation mismatch is resolved, and the remaining Phase 1c limitations are either documented in the Methodology Registry or explicitly tracked in TODO.md.

Executive Summary

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity P3 (informational). Impact: The remaining Phase 1c limitations are explicitly tracked in TODO.md, so they are non-blocking per policy. Concrete fix: None in this PR; follow the tracked items for triangular/uniform kernel parity, public HC exposure, weights=, multi-eval support, and clustered auto-bandwidth parity. TODO.md:89

Security

  • No findings.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 20, 2026
@igerber igerber merged commit 9c787f0 into main Apr 20, 2026
23 of 24 checks passed
@igerber igerber deleted the did-no-untreated branch April 20, 2026 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant